Disentangling and mitigating the impact of task similarity for continual learning

Neural Information Processing Systems

Continual learning of partially similar tasks poses a challenge for artificial neural networks, as task similarity presents both an opportunity for knowledge transfer and a risk of interference and catastrophic forgetting. However, it remains unclear how task similarity in input features and readout patterns influences knowledge transfer and forgetting, as well as how they interact with common algorithms for continual learning. Here, we develop a linear teacher-student model with latent structure and show analytically that high input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention. Conversely, the opposite scenario is relatively benign. Our analysis further reveals that task-dependent activity gating improves knowledge retention at the expense of transfer, while task-dependent plasticity gating does not affect either retention or transfer performance at the over-parameterized limit. In contrast, weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity, without compromising transfer performance. Nevertheless, its diagonal approximation and regularization in the Euclidean space are much less robust against task similarity. We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it.
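
The contrast between Fisher-metric and Euclidean weight regularization described in the abstract above can be illustrated with a toy sketch. This is not the paper's teacher-student model: the two-dimensional linear student, the hand-picked datasets `TASK1` and `TASK2`, the penalty strength `lam=1.0`, and all function names are assumptions made for illustration. For a linear model with unit-variance Gaussian noise, the Fisher information with respect to the weights is the input second-moment matrix E[x xᵀ], which is what `fisher` computes.

```python
# Toy sketch (assumed setup, not the paper's model): a 2-D linear student
# learns task 1, then task 2, with different penalties anchoring it to the
# task-1 solution.

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse(w, data):
    """Mean squared error of weights w on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def fisher(data):
    """Input second-moment matrix E[x x^T]; equals the Fisher information
    of a linear-Gaussian model with unit noise variance."""
    n = len(data)
    return [[sum(x[i] * x[j] for x, _ in data) / n for j in range(2)]
            for i in range(2)]

def train(w_init, data, metric=None, anchor=None, lam=0.0, lr=0.05, steps=2000):
    """Gradient descent on mse + lam * (w - anchor)^T metric (w - anchor)."""
    w = list(w_init)
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y
            for i in range(2):
                grad[i] += 2.0 * err * x[i] / n
        if metric is not None:
            d = [w[i] - anchor[i] for i in range(2)]
            for i in range(2):
                grad[i] += 2.0 * lam * sum(metric[i][j] * d[j] for j in range(2))
        w = [w[i] - lr * grad[i] for i in range(2)]
    return w

# Hypothetical data: task 1 only constrains w[0]; task 2 only constrains
# w[0] + w[1], so the two tasks overlap in the first input feature.
TASK1 = [((1.0, 0.0), 1.0), ((2.0, 0.0), 2.0), ((-1.0, 0.0), -1.0)]
TASK2 = [((1.0, 1.0), 3.0), ((-1.0, -1.0), -3.0), ((2.0, 2.0), 6.0)]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

w_task1 = train([0.0, 0.0], TASK1)
results = {}
for name, metric in [("none", None), ("euclidean", IDENTITY),
                     ("fisher", fisher(TASK1))]:
    w_task2 = train(w_task1, TASK2, metric=metric, anchor=w_task1, lam=1.0)
    # (error on the old task, error on the new task)
    results[name] = (mse(w_task2, TASK1), mse(w_task2, TASK2))
    print(name, results[name])
```

In this toy setting the Fisher penalty only restricts movement along the direction task 1 actually used, so the student can fit task 2 while keeping its task-1 error near zero. The Euclidean penalty restricts every direction equally and ends up with nonzero error on both tasks, and the unregularized run forgets task 1. The example is deliberately benign (both tasks are jointly solvable) and says nothing about the harder regimes analyzed in the paper.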


Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Neural Information Processing Systems

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to




Appendix (LAION-5B: An open large-scale dataset for training next generation image-text models) A Datasheet for LAION-5B dataset A.1 Motivation Q1

Neural Information Processing Systems

For what purpose was the dataset created? Was there a specific task in mind? YFCC with 100 million image/videos and associated metadata. Who created the dataset (e.g., which team, research group) and on behalf of which Who funded the creation of the dataset? This work was sponsored by Hugging Face and Stability AI. What do the instances that comprise the dataset represent (e.g., documents, photos, Are there multiple types of instances (e.g., movies, users, and ratings; We provide 5.8 billion image-text pairs.


b6af2c9703f203a2794be03d443af2e3-Paper.pdf

Neural Information Processing Systems

In this work, we combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models. For a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity.


Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Neural Information Processing Systems

A wide range of reinforcement learning (RL) problems -- including robustness, transfer learning, unsupervised RL, and emergent complexity -- require specifying a distribution of tasks or environments in which a policy will be trained.